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Abstract. Stochastic evolution of Chemical Reactions Networks (CRNs) 
over time is usually analysed through solving the Chemical Master Equa¬ 
tion (CME) or performing extensive simulations. Analysing stochasticity 
is often needed, particularly when some molecules occur in low numbers. 
Unfortunately, both approaches become infeasible if the system is com¬ 
plex and/or it cannot be ensured that initial populations are small. We 
develop a probabilistic logic for CRNs that enables stochastic analysis of 
the evolution of populations of molecular species. We present an approx¬ 
imate model checking algorithm based on the Linear Noise Approxima¬ 
tion (LNA) of the CME, whose computational complexity is independent 
of the population size of each species and polynomial in the number of 
different species. The algorithm requires the solution of first order poly¬ 
nomial differential equations. We prove that our approach is valid for 
any CRN close enough to the thermodynamical limit. However, we show 
on four case studies that it can still provide good approximation even for 
low molecule counts. Our approach enables rigorous analysis of CRNs 
that are not analyzable by solving the CME, but are far from the deter¬ 
ministic limit. Moreover, it can be used for a fast approximate stochastic 
characterization of a CRN. 


1 Introduction 

Chemical reaction networks (CRNs) and mass action kinetics are well studied for¬ 
malisms for modelling biochemical systems [9]. In recent years, CRNs have also 
been successfully used as a formal programming language for biochemical sys¬ 
tems |:-;;il7ll()| . There are two well established approaches for analyzing chemical 
networks: deterministic and stochastic |20] . The deterministic approach models 
the kinetics of a CRN as a system of ordinary differential equations (ODEs) 
and represents average behaviour, valid in the thermodynamic limit |19j . The 
stochastic approach, on the other hand, is based on the Chemical Master Equa¬ 
tion (CME) and models the CRN as a continuous-time Markov chain (CTMC) 
[^. The stochastic behavior can be analyzed by stochastic simulation or by 
exhaustive probabilistic model checking of the CTMC, which can be performed, 
for example, by using PRISM |25) . 
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Exhaustive analysis of the CTMC is able to find the best- and worst-case 
scenarios and is correct for any population size, but suffers from the state-space 
explosion problem [26] and can only be used for relatively small systems. In 
contrast, deterministic methods are much more robust with respect to state- 
space explosion, but unable to represent stochastic fluctuations, which play a 
fundamental role when the system is not in thermodynamic equilibrium. 

Contributions. In this paper we develop a novel approach for analysing 
the stochastic evolution of a CRN based on the Linear Noise Approximation 
(LNA) of the CME. We formulate SEL (Stochastic Evolution Logic), a prob¬ 
abilistic logic for CRNs that enables reasoning about probability, expectation 
and variance of linear combinations of populations of the species. Examples of 
properties that can be specified in our logic are shown in Example [T] We pro¬ 
pose an approximate model checking algorithm for the logic based on the LNA 
and implement it in Matlab and Java. We demonstrate that the complexity of 
model checking is polynomial in the initial number of species and independent of 
the initial molecule counts, thus ameliorating state-space explosion. Further, we 
show that model checking is exact when approaching the thermodynamic limit. 
Though the algorithm may not be accurate for systems far from the deterministic 
limit, this generally happens when the populations are small, in which case the 
analysis can be performed by transient analysis of the induced CTMC [23] . Our 
approach is essential for CRNs that cannot be analyzed by (partial) state space 
exploration, because of large or infinite state spaces. Moreover, it is useful for a 
fast (approximate) stochastic characterization of CRNs, since solving the LNA 
is much faster than solving the CME [16]. We prove asymptotic correctness of 
LNA-based model checking and show on four examples that it is still possible to 
obtain very good approximations even for small population systems, comparing 
with standard uniformisation [24] and statistical model checking implemented 
in PRISM [25]. 

Related work. The closest work to ours is by Bortolussi et al. [4] , which uses 
the Central Limit Approximation (CLA) (essentially the same as the LNA) for 
checking restricted timed automata specifications, assuming a fixed population 
size. Wolf et al. m develop a sliding window method to approximately verify 
infinite-state CTMCs, which applies to cases where most of the probability mass 
is concentrated in a confined region of the state space. Recently, Finite State 
Projection algorithms (ESP algorithms) for the solution or approximation of the 
CME have been introduced [2S]. Both methods apply to the induced CTMC, but 
require at least partial exploration of the state space, and are thus not immune 
to state-space explosion. 

Structure of the paper. In Section [2] we summarise the deterministic 
and stochastic modelling approaches for CRNs, and in Section[3]we describe the 
Linear Noise Approximation method. Section [3] introduces the logic SEL and 
the corresponding model checking algorithm based on the LNA. In Section [5] we 
demonstrate our approach on four networks taken from the literature. Section [6] 
concludes the paper. 


2 Chemical Reaction Networks 


A chemical reaction network (CRN) C = (A, R) is a pair of finite sets, where 
A is the set of chemical species and R the set of reactions. |A| denotes the 
size of the set of species. A reaction r e i? is a triple t = {rr,PT,kr), where 
Ct^Pt G and kr G K>o- fr and Pr represent the stoichiometry of reactants 
and products and k^- is the coefficient associated to the rate of the reaction; 
its dimension is s“^. We often write reactions as Ai + A 3 — 2 X 2 instead of 
n = ([1, 0,1]”^, [0, 2, 0]”^, fci), where indicates the transpose of a vector. We 
define the net change associated to a reaction t hy Vr = Pt — t’t- For example, 
for Ti as above, we have = [— 1 , 2 , —1]^. 

We make the assumption that the system is well stirred, that is, the prob¬ 
ability of the next reaction occurring between two molecules is independent of 
the location of those molecules. We consider fixed volume V and temperature; 
under these assumptions a configuration or state x G of the system is given 
by the number of molecules of each species. We define [x] = the vector of 
the species concentration in x for a given N^ where N = V ■ is the volumet¬ 
ric factor, V is the volume of the solution and is Avogadro’s number. The 
physical dimension of N is Mol~^ ■ L, where Mol indicates mole and L is litre. 
Given Xi G A then ffXi-X G N represents the number of molecules of A^ in x and 
[Ai]_a; G M the concentration of Xi in the same configuration. In some cases we 
elide x, and we simply write ffXi and [Ai] instead of ffXi-X and [Ai]_a;. They are 
related by [Aj] = The dimension of [Ai] is Mol ■ L~^. 

The propensity of a reaction r in terms of the number of molecules is a 
function of the current configuration of the system x such that an,Tix)dt is the 
probability that a reaction event occurs in the next infinitesimal interval dt. In 
this paper we assume as valid the stochastic form of the law of mass action, so 
the propensity rates are proportional to the number of molecules that participate 
in the reaction [ 6 ]. Stochastic models consider the system in terms of numbers 
of molecules, while deterministic ones, generally, in terms of concentrations, and 
the relationship is as follows. For a reaction r = (rrjPrjkr), given the configu¬ 
ration X and the z-th component of Xr, then ac^rix) = k^ nll'i ([A.]-^)’’^'* 
is the propensity function expressed in terms of concentrations as given by the 
deterministic law of mass action. It is possible to show that, for any order of 
reaction, an,T{x) Ri Nac,T{x) if N is sufficiently large [I]. Note that ac,T is 
independent of N. In this paper we are interested only in finite time horizon, 
because of the problematic character of studying solutions of ODEs for infinite 
time horizon [3]. 

Example 1. Consider the CRN C = ({Ai, A 2 , A 3 }, i?), where R = {(Ai -I- A 2 — 

A 2 + A 2 ),(A 2 -I- A 3 — A 3 -I- A 3 )}, with initial conditions #Ai = 98, #A 2 = 
1 )#A 3 = 1, for a system with N = 1000. Figure [T] plots the expectation and 
standard deviation of population sizes. We may wish to check if the maximum 
expected value of )fX 2 remains smaller than 75 molecules during the first 2sec. 
However, the system is stochastic, so we also need to analyse whether the vari¬ 
ance is limited enough when 4 ^X 2 reaches the maximum. Sometimes, analysis 


of first and second moments does not suffice, so it could be of interest to check 
the probability of some events, for instance, is the probability that, between 
ti — O.bsec and ^2 = 1-Osec, #A 2 — (#Ai + > 0 greater than 0.6? 


Deterministic semantics. Let C = 

{A, R) be a CRN. The deterministic 
model approximates the concentration of 
the species of the system over time as a 
set of autonomous polynomial first order 
differential equations: 
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Func¬ 
tion <P : ]R>o ^ describes the be¬ 
haviour of the system as a set of deter¬ 
ministic equations assuming a continuous 
state-space semantics, therefore G 
is the vector of the species concen¬ 
trations at time t. Assuming to = 0, the 

initial condition is <?(0) = [xq], expressed as a concentration. Note that F{(I>{t)) 
is Lipschitz continuous, so exists and is unique [Hi- 

Stochastic semantics. CRNs are well represented by CTMCs, whose tran¬ 
sient analysis can be performed via the Chemical Master Equation (CME) [35] . 


Fig. 1: Expected number and stan¬ 
dard deviation of species of the CRN 
of Exampledjfor the given initial con¬ 
ditions, calculated by simulating the 
CME. 



Definition 1 Given a CRN C = {A, R) and the volumetric factor N, we de¬ 
fine a time-homogeneous CTMC \1H3S^ {X^{t)G G R>o) with state space S = 
Given xg G S, the initial configuration of the system, then P{X^{0) = 
xg) = 1. The transition rate from state xi to state Xj is defined as r{xi,Xj) = 

'l2{TeR\xj=xi-^-v^} F[ac,T{xi). 

X^{t) describes the stochastic evolution of molecular populations of each species 
at time t. Eor x G S, we define P^*\x) = P{X^(t) = a;|A(0) = xq), where xg is 
the initial configuration. The CME describes the time evolution of X^ as: 

^ (^P^*\x) ^ - Vr)p‘^*\x - Vr) - Nac,T{x)P^^\x).} (2) 

reR 

The CME can be equivalently defined in terms of the infinitesimal generator 
matrix which admits computing an approximation of the CME using, for 
examole. fast adaotive uniformisation |14I13| or the slidine window method m- 
We also define the CTMC (, t G K>o) with state space S = If 

[xo] G 5” is the initial configuration, then P{ — = [xq]) = 1. The transition 
rate from state [xj] to [xj] is defined as r([xi], [xj]) = i^reR\[xj\=[xi\+^} FtcXc,T{xi) 


















—is the random vector describing the system at time t in terms of concen- 
trations. In 11117) it is proved that lim sup ||—— ^(lO]ll = 0 almost surely 

N^cot><t 

for every time t. This explains the relationship between the two different seman¬ 
tics, where the deterministic solution can be viewed as a limit of the stochastic 
solution, valid when close enough to the thermodynamic limit. 

3 Linear Noise Approximation 

The solution of the CME can be computationally expensive, or even infeasible, 
because the set of reachable states can be huge or infinite. The Linear Noise 
Approximation (LNA) has been introduced by Van Kampen as a second order 
approximation of the system size expansion of the CME [3S]. Since stochastic 
fluctuations depend on iV, and specifically, for average concentrations, are of the 
order ot N 2 |T6l32| . to derive the expansion Van Kampen assumes that: 

X^{t)^N<P{t)+N^Z{t) (3) 

where Z{t) = {Zi(t), Z 2 {t),Z\j\\) is the random vector, independent of V, 
representing the stochastic fluctuations, (p{t) is given by the solution of Eqn ([IJ 
and (t) is the random vector of Definition [1] Using this substitution in the 
system size expansion and then truncating at the second order, the probability 
distribution of Z{t) is found to be given by the following linear Fokker-Plank 
equation m- 

i 

(4) 

where G'(^(t)) = is the j—th component 

of F{<P{t)). The solution of Eqn (|4]) gives a Gaussian process. For every time t, 
Z (t) has a multivariate normal distribution, whose expected value and covariance 
matrix are the solution of the following equations |16I21] : 

= Mm)Em)] (5) 

= M<p(t))C[Zit)] + C[Z{t)]J^^{<l>it)) + Gim) (6) 

where Jpi^it)) is the Jacobian of F(^(t)). We consider as initial conditions 
E[Z{0)] = 0 and C[Z{{))\ = 0. This means that E[Z{t)\ = 0 for every t. 

It is possible to justify the hypothesis ([3]) noting that in the lowest order the 
CME expansion reduces to Eqn o, and with the following theorem by Kurtz: 

Theorem 1. J77| / Consider the subset E C on which are defined the propen¬ 
sity functions a c,t ■ Let Z^{t) be the random vector given by Z^{t) = — 

Suppose that ^ \vt^\ sup acr(J^) < 00 for each compact K C E, and 
t&R x^k 

that, for N —>■ 00 , Z^ {0) = Z{0), then Z^{t) converges in distribution to Z{t). 


dP{Z,t) 

dt 


1^1 1^1 
i=l J = 1 


dFfi<l>{t))d{Z,P{Z,t)) , 1 


Al 1^1 


d<L>i 


dZ, 




i=i 3=1 


d^P{Z, 

dZ,dZ 















The LNA thus permits approximation of the probability distribution of X^{t) 
with the probability distribution of (t) = N<l>{t) +N^Z{t). It is easy to show 
that Y^{t) has a Gaussian distribution; indeed, Z{t) is Gaussian distributed, 
and N and are deterministic. 

To compute the LNA it is necessary to solve 0(|Ap) first order differential 
equations, but the complexity is independent of the initial number of molecules 
of each species. Therefore, one can avoid the exploration of the state space that 
methods based on uniformisation rely upon. 

Theorem [T] alone only guarantees convergence in distribution. However, in 
[.36] . LNA is derived as an approximation of the Ghemical Langevin Equation 
(CLE) [T5], rather than system size expansion. This shows that LNA is valid for 
every real chemical system close enough to the thermodynamical limit, at least 
for a limited time. Thus, LNA is exact in the limit of high populations, but can 
also be used for small populations if the behaviour is not too far from the deter¬ 
ministic limit, taking into account the continuous nature of the approximation 
and Gaussian assumptions on the noise [21136] . 

3.1 Probabilistic analysis of CRNs 

We have shown that can be approximated by Y^{t) = N<P(t) + N^Z{t), 
where Y^ (t) has a multivariate Gaussian distribution, so it is completely charac¬ 
terized by its expected value and covariance matrix, whose values are respectively 
E[Y^{t)] = N${t) and C[Y^{t)] = N^C[Z{t)]N^ = NC[Z{t)]. 

Since Y^ has a multivariate normal distribution then every linear combina¬ 
tion of its components is normally distributed. Therefore, given H = [ 6 i, 62 , • • ■ , &|yi|] 
where 61 , 62 ,..., &|yi| € we can consider the random variable BY^{t), which 
defines a linear combination of the species at time t. For every t, BY^{t) is a 
normal random variable, whose expected value and variance are 

E[BY^{t)] = BE[Y^{t)] (7) 

C[BY^{t)] = BC[Y^{t)]B^ (8) 

For a specific time tfc, it is possible to calculate the probability that BY^{tk) 
is within a set I of closed, disjoint real intervals [li,Ui], where li,Ui G K U 
{-boo, — 00 }. This probability is given by 

Ui 

nYEBj{tk)= Y. [ 9mBY^{tk)],C[BY^{tk)])dx (9) 

lii,ui]eii. 

where g{x\EV,a^) is the Gaussian distribution with expected value EV and 
covariance tr^. We recall that it is possible to find numerical solution of Eqn ([9|) 
in constant time using the Z table [30] . 

Example 2. Consider the CRN of Example (U then we can obtain the probability 
that #Ai — 2 #A 3 is at least 10 at time 20 by defining B' = [1,0,—2], /' = 
{[ 10 ,-boo]} and calculating f2YN b>j>{20). 




The following theorems are consequences of results in [3B], which can be gener¬ 
alized for reactions with a finite number of reagents and products. They show 
asymptotic pointwise convergence of expected value, variance and probability. 

Theorem 2. Let C = {A, R) be a CRN. Suppose the solution of Eqn (jS]) is 
bounded, then, approaehing the thermodynamic limit, for any finite instant of 
time ti ^ 

Jim WCyN (10) 

Af—s-oo 

where Cxn Bj{ti) is the probability that B{X^) is within I at time ti. 

Theorem 3. Suppose the solution of Eqn is bounded, then, approaching the 
thermodynamic limit, for any finite instant of time tk 

hm \\C[BY^{tk)] - C[BX^itk)]\\ = 0 (11) 

N—^oo 

hm \\E[BY^itk)]-E[BX^itk)]\\=0. (12) 

W—s-oo 

To solve the differential equations ® and ®, it is necessary to use a nu¬ 
merical method such as adaptive Runge-Kutta algorithm This yields the 
solution for a finite set of sampling times X = [ti, ...,t| 2 ;|] G where ti < 

■ ■■ X tk < ... < Jui and IHI is the sample size. Assuming Y^ is separable, that 
is, it is possible to completely define the behavior of Y^ by only considering a 
countable number of points, we can calculate ,b.i for ^my point in E and 
if points are dense enough then this set exhaustively describes the probability 
that BX^ is within I over time. This restriction is not a limitation since for any 
stochastic process there exists a separable modification of it [53]. 

4 Stochastic Evolution Logic (SEL) 

Let C = {A, R) be a CRN with initial state xq, in a system of size N. We now 
define the logic SEL (Stochastic Evolution Logic) which enables evaluation of 
the probability, variance and expectation of linear combinations of populations 
of the species of C. 

The syntax of SEL is given by 

?7 := I Q^v[B][tytPi I ??iA 772 | 771 V 7?2 

where Q = {supV,infV, supE,infE}, {<,>}, p G [0,1], v G R, B G 
d = {[li,Ui\ \ li,Ui gRU [- 1 - 00 ,- 00 ] A [li,Ui] n [lj,Ui] = 0, i ^ j} and [^1,^2] is 
a closed interval, with the constraint that ti < ^2 and ti,t 2 G M. If ti = <2 the 
interval reduces to a singleton. 

Eormulae t] describe global properties of the stochastic evolution of the sys¬ 
tem. {B, I) specifies a linear combination of the species of C and a set of intervals, 
where B G is the vector defining the linear combination and I represents 


a set of disjoint closed real intervals. is the probabilistic oper¬ 

ator, which specifies the probability that the linear combination defined by B 
falls within the range I over the time interval [^ 1 ,^ 2 ]- supE,infE,infV,supV 
respectively yield the supremum and infimum of expected value and variance of 
the random variables associated to B within the specified time interval. 

Example 3. Consider the CRN of Example[T] Checking if the variance of #Ai re¬ 
mains smaller than Ki within can be expressed as J[l, 0 , 0 ]][t^ 

Another example is checking if, in the same interval, (#Ai — #A 2 ) is at least K 2 
or within [K^^K 4 \, with < K 4 < K 2 , with probability greater than 0.95: 
R>o.95[[1, -1,0], {[K 3 , K 4 ], [K 2 ,oo])][t.^tk]- Equivalently, instead of writing B, we 
write directly the linear combination it defines. For example, in the latter case 
we have P>o. 95 [(#Ai - #A 2 ), ([ATa,-^ 4 ], [ATa, 

Semantics Given a CRN C = {A, R) with initial configuration xq in a system of 
fixed volumetric factor N, its stochastic behaviour is described by the CTMC 
of Definition [TJ We define a path of CTMC as a sequence w = xotiXitiX 2 --- 
where Xi is a state and A S K.>o is the time spent in the state Xi. A path is 
finite if there is a state Xk that is absorbing, w ® t is the state of the path at 
time t. Path{X^,xo) is the set of all (finite and infinite) paths of the CTMC 
starting in Xq. We work with the standard probability measure Prob over paths 
Path{X^,xo) defined using cylinder sets 

We first define when a path w satisfies {B, I) at time t 

uj,t \= [B, I) o 3[li,Ui] £ I .li < B(u} ®t) < Ui- 

Note that B{uj(i)t) is well defined because uj®t £ . We now define Pr^ j (t) = 

Prob{uj £ Path{X^,xo) \ uj,t\= (B,/)}, then if the time interval is a singleton 
the satisfaction relation for the probabilistic operator is 

X^,xo \= P^p[B,I]it^^ti] PrB,i{ti)'^P 

Instead, for ti < t 2 we have 

1 jy 

1= o -- - Pr^j{t)dtr^p 

C2 - Cl Jti 

Pr^’^j (t) is the probability of the set of paths of X^ such that the linear combi¬ 
nation of the species defined by B falls within /. It is well defined since we have 
previously defined the probability measure Prob on Path{X^,Xo). To define 
the satisfaction relation of the probabilistic operator we simply take the average 

■yN 

value of Prg j (t) during the interval \ti , ^ 2 ] ■ For the remaining operators the 
satisfaction relation is defined as 

X^,Xo ^ O sup{C[B{X^% [^ 1 ,^ 2 ]) ~ V 

X^,xo\=infV^y[B][ti,t 2 ] ^ 'i'rif{C[B{X^)],[ti,t 2 ])-^ 


V 



V 


,xq\= supEr^y[B\t^M\ ^ sup{E[B{X^)],[ti,t 2 ]) ^ 

X^,Xo h infE^-u[B][t^^t 2 ] ^ inf{E[B{X^)], [ti,t 2 ]) -- v 

X^,xo \= Pi Ar]2 ^ X^,Xo\=PiAX^,Xo\=P2 

X’^,xo \= pi'\/P2 ^ X^,xo \= pi'\/X^,Xo \= P2 

inf{-, [ti,t 2 ]) and sup{-, [ii,t 2 ]) respectively denote the infimum and supremum 
within [^ 1 ,^ 2 ]- 


4.1 LNA-based Approximate Model Checking for CRNs 

Stochastic model checking of CRNs is usually achieved by transient analysis 
of the CTMC X^ [^, which involves solving the CME and thus suffers from 
the state-space explosion problem. We propose an approximate model checking 
algorithm based on LNA. The inputs are a SEL formula ry, the stochastic process 
X^ induced by the CRN and initial state Xq. The output is true in case the 
formula is verified, and otherwise false. 

The algorithm proceeds by induction on the structure of formula 77, succes¬ 
sively computing whether each subformula is satisfied or not. We assume that 
Eqn (O and ([0]) are solved numerically where S is the finite set of sample points 
on which their solution is defined and that to, initial time, and tmax, final time, 
are always sampling points. 


Probabilistic operator. To evaluate Pr...p[{B, ^t 2 ] construct the func¬ 
tion Prob(^g j'f{t) = fiyN for t £ \ti,ti+i),ti,tiyi £ X (alternatively, can 

be constructed as the interpolation of the values of .bj over E points). 

Lemma 1. Prob(^B,i) *5 integrable onK>o. 

Theorem[2] guarantees the pointwise correctness of Prob(^Bj) ^-ad its integrability 
allows us to compute the following approximation, then compare to threshold p 
to decide the truth value. If t 2 ti then 7 ^ 3 ^ P'^'b^i (0 ~ t 2 -ti Iti PTobB,i{t)dt 

else if ti = t 2 then Pr^jiti) r; ProbB,i{ti). 

Expectation and variance operators To evaluate sup{C[B{X^)\,\ti,t 2 ]), 
inf{C[B{X^)\,[ti,t 2 \), sup{E[B{X^)],[ti,t 2 \) and m/(E[R(A^)], [ti,< 2 ]) we 
use the LNA, namely, compute the expected value and variance of Eqn m 
and 0 . Theorem [3] guarantees the quality of the approximation. We can now 
compute the following approximations, then compare to the threshold v: 

sup{C[B{X^)], [ti,t 2 ]) ^ max{C[BY^(tk)] \ {tk £ EAti <tk < t2)V(4 £ 

infiC[B{X^)], [ti,t 2 ]) ^ min{C[BY^(tk)] \ (tk £ EAE <tk< t 2 )V{tk £ 

and similarly for the expected value. L[ti,t 2 ] = € E A $tj £ E such 

that \ti — tj\ < \ti — tiW ensures that for any time interval there is at least 




one sampling point, even if the interval is a singleton. Note that, for each sub¬ 
formula, the algorithm involves the calculation of some quantity, so one can 
define a quantitative semantics for SEL as in m- 

LNA-based model checking can also be used for systems far from the thermo¬ 
dynamic limit, at a cost of some loss of precision. LNA assumes continuous state 
space, and it is not possible to justify this assumption for very small popula¬ 
tions. However, if the distributions of interest are not multi-modal and the noise 
term is finite and approximated by a Gaussian distribution, then LNA gives very 
good approximation even for quite small systems. It is clear that model checking 
accuracy increases as N grows. We emphasise that the model checking algorithm 
we have presented is also able to handle CRNs whose stochastic semantics is an 
infinite CTMC, which occur frequently in biological models. 

Complexity of LNA-based approximate model checking The time com¬ 
plexity for model checking formula 77 against a CRN C = {A, R) is linear in \r]\. 
In the worst case, analysis of a single operator requires the solution of 0(|Ap) 
polynomial differential equations for a bounded time. However, an efficient im¬ 
plementation can solve the 0(|Ap) ODEs only once for the interval [0, tmax], and 
then reuse this result for every operator, where tmax is the greatest (finite) time 
of interest. Note that ODEs are solved in terms of concentrations (a value be¬ 
tween 0 and 1 by convention), ensuring independence of the number of molecules 
of each species, although stiffness can slow down the solution of the LNA. 

5 Experimental Results 

We implemented the methods in a framework based on Matlab and Java. The 
experiments were run on an Intel Dual Core i7 machine with 8 GB of RAM. 
To solve the differential equations, we use Matlab ocie45, a variable step Runge- 
Kutta algorithm. We employ LNA-based model checking for the analysis of four 
biological reaction networks: a Phosphorelay Network [12], a Gene Expression 
Model |34I27] , the FGF pathway [22] and the GW network [ 8 ]. For every network, 
the CRN and parameters have been taken from the referenced papers. We coded 
the same CRNs in PRISM in order to compare accuracy and time of execution 
with standard uniformisation of the CME [24] and statistical model checking 
(SMC) techniques (confidence interval method) as implemented in PRISM. For 
the FGF and GW case studies, we cannot use global analysis nor SMC, because 
the state space is too large for direct analysis, and SMC requires many time- 
consuming simulations to obtain good accuracy. 

Phosphorelay Network. We consider a three-layer phosphorelay network 
whose structure is derived from [12]. Each layer (L1,L2,L3) can be found in 
phosphorylate form (Lip, L2p, L3p). We consider the initial condition #Llp = 
4j^L2p = = 0, i(fLl = ^L2p = 4I^Lip = Init, where Init € N. Then we 

analyse the ligand B, whose initial condition is = 3* Init. We are interested 
in checking the following SEL property: 

P>o. 7 [i#Llp — ^L3p), [0, -boo]] [0,100] A P>o.98[(#L3p — ^Llp), [0, -boo]] [300,000] 




which is verified if, in the first interval, the probability that jfLlp is greater 
than is > 0.7 and if, between 300 and 600, with probability > 0.98, 

^Lip is greater than ^L\p. We evaluate this formula in three different initial 
conditions, firstly Init = 32 and N = 5000, then Init = 64 and N = 10000, 
and finally Init = 100 and N = 15625, so the same concentration but different 
numbers of molecules. In all cases, the LNA-based model checking evaluates the 
formula as true. To understand the quality of the approximation, we check the 
following quantitative formula P,=7[(#L3p — #Llp, [0, +oo])][ 7 '_t] for T G [0, 600] 
(in our implementation =? gives the quantitity calculated by model checking 
the operator). We compare the results with the evaluation of the corresponding 
CSL formula using standard uniformisation (Unif) with error 10“^ |24j . The 
following table shows the results. MaxErr is the maximum error computed by 
LNA-based approach compared to standard uniformisation and AvgErr is the 
average error; Time{-) stands for execution time. 


Init 

Time (LNA) 

Time (Unif) 

MaxErr 

AvgErr 

20 

0.22 sec 

2 min 

0.0675 

0.0519 

32 

0.23 sec 

5 min 

0.059 

0.02 

64 

0.26 sec 

> 2 hr 

0.0448 

0.0027 

100 

0.3 sec 

> 2 hr 

0.03 

0.0011 


Note that as Init increases the error of our method decreases, while the execution 
time is practically independent of the molecular count. LNA-based algorithms 
are faster in all cases. Thus our approach can be used even for quite small 
population systems, giving a fast approximate stochastic characterization. 

Gene Expression. We consider a simple CRN that models the transcrip¬ 
tion of a gene into an mRNA molecule, and the translation of the latter into 
a protein. The CRN, rates and initial conditions are the same as in [27]. The 
stochastic semantics of the reaction network is an infinite CTMC, and we use 
this model to show that our method can handle infinite state-space processes. 
We consider the quantitative property supE^-fl^niRNA][ x,t]: which gives the 
number of molecules of mRNA in the system at time T. We compare our method 
with SMC estimation of the same property by using 50000 simulations, for 
T = {300,600,900,1200}, and in the following tables we compare the results 
in terms of execution time (Time(-)) and expected value of ^mRNA estimated 
{ExpVal{-)). LNA-based model checking is several orders of magnitude faster 
without loss of accuracy. 


T 

Time (LNA) 

Time (Simul) 

ExpVal (LNA) 

ExpVal (Simul) 

300 

0.52 sec 

75 sec 

100.17 

100.14 ± 0.1 

600 

0.54 sec 

198 sec 

142.15 

142.11 ± 0.1 

900 

0.54 sec 

337 sec 

159.73 

159.74 ± 0.1 

1200 

0.56 sec 

483 sec 

167.1 

167.1 ± 0.1 


FGF. We consider the model of Fibroblast Growth Factor (FGF) signalling 
pathway developed in [22] composed of more than 50 reactions and species. We 
consider the system with initially 105 molecules for species with non-zero initial 














concentration. Analysis of the model reveals that the phosphorylated form of 
FRS2 can bind the protein Src, and then this new complex, Src:FRS2, can re¬ 
locate out. We want to check if the expected value of #Src-.FRS2 during the hrst 
3000 seconds reaches a maximum value greater than 40. We do that by checking 
the property sMpi?>4o[#S'rc:Fi?5'2][o_3ooo]- The formula evaluates to true, and in 
Figure[2]we analyze the expected value and standard deviation of ^Src:FRS2. 
We obtain these values directly from the logic considering the quantitative in¬ 
terpretation of supE^^yfSrc'.FRS2]p’ rp] and supV-^y^Src.FRS2]p’ rp] T S 
[0, 3000]. It is possible to see that, after an initial peak, relocation causes expo¬ 
nential decay. 

In the same figure we show 
a single stochastic simulation of 
the system for the same initial 
conditions, confirming our evalu¬ 
ation. Moreover, the approxima¬ 
tion can be justified theoretically. 

^Src:FRS2 converges to zero nec¬ 
essarily and this demonstrates the 
unimodality of the distribution of 
the species; we note that the vari¬ 
ance is finite, so Eqn ([3]) holds. 

DNA strand displacement 
of GW network GW is a net¬ 
work related to the G2-M cell cycle 
switch [29) . Under particular ini¬ 
tial conditions, it has been shown 
that GW can emulate the Ap¬ 
proximate Majority algorithm [8]. 

Here, we consider the two-domain 
DNA strand-displacement imple¬ 
mentation of GW [?]■ The corre¬ 
sponding GRN is composed of 340 
species and 240 reactions. For our 
P, whose initial conditions are = 90 and = 10; initial conditions 
of other species are taken from the referenced papers. We check the property 
P>o. 9 [#i? — #H, [50, +CX)]] [ 6000 , 35000 ] for a system of size N = 45000, which is 
verified as true in 28 minutes. 

6 Concluding Remarks 

We presented a novel probabilistic logic for analysing stochastic behaviour of 
CRNs and proposed an approximate model checking algorithm based on the 
LNA of the CME. We have demonstrated on four non-trivial examples that LNA- 
based model checking enables analysis of CRNs with hundreds of species, and 
even infinite CTMCs, at a cost of some loss of accuracy. It would be interesting 



Fig. 2: Expected number and standard devi¬ 
ation of species of =f^Src:FRS2 in the FGF 
pathway during the first 8000 seconds esti¬ 
mated by our method is compared with a 
stochastic simulation of the same species. 

analysis the species of interest are R and 













to find bounds on the approximation error when the system is far from the 
thermodynamic limit. However, the error is not only dependent on the value 
of N, but also on the structure of the CRN, the rates, and the property. As 
future work, we plan to improve the accuracy of the method near critical points 
similarly to the approach of m. and to extend the logic with more expressive 
temporal operators in the style of CSL [5]. We also intend to release a software 
tool based on LBS m- 
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